Your Search Engine Needs a Memory!

Eric Pugh and Stavros Macrakis • Location: Theater 7 • Haystack 2024

Every search professional needs data about users’ behavior. This data is fundamental for understanding how people search and for improving search relevance, both through manual tuning and with machine learning. The User Behavior Insights system provides a standard way to collect it.

Business product managers, UX designers, and relevance engineers need data to understand their users: What do they search for? What do they click on? What do they act on or buy? How do they use facets and filters? How do they refine their queries within a session? Engineers need data to improve search relevance and effectiveness, both manually and using AI/ML.

But until now, collecting user behavior data has been haphazard. Proprietary SaaS search systems collect data but don’t share it with their customers. Open-source systems simply don’t offer data collection mechanisms. Third-party analytics systems are designed primarily to track page-to-page flow rather than the flow of search results through the system, and they often make it difficult to extract the raw, granular data needed for model training. Consequently, many search teams develop and maintain their own ad hoc data collection systems and analysis tools.

Our open-source User Behavior Insights (UBI) system provides a client-side library for instrumenting web pages, a server-side library for collecting data, and analytical tools for understanding it. Critically, it defines a standard schema for behavior data so that the community can contribute additional analytical tools. We have also demonstrated its integration with personalization software.

With the emergence of even more ways of generating and ranking search results – neural dense search, neural sparse search, model fine-tuning, hybrid search, RAG, … – choosing the best mix of approaches for your search application becomes even more critical.

UBI is a call to action to the search relevance community: make it simple to track, ethically and safely, each step of a user’s search journey, so that we can build the search experiences the future requires.

Eric Pugh

OpenSource Connections LLC

Eric Pugh is the co-founder and CEO of OpenSource Connections. Today he helps OSC’s clients, especially those in the ecommerce space, build their own search teams and improve their search maturity, both by leading projects and by acting as a trusted advisor. Fascinated by the craft of software development, Eric has been involved in the open source world as a developer, committer and user for the past fifteen years. He is a member of the Apache Software Foundation and an active committer to Apache Solr. He co-authored the book Apache Solr Enterprise Search Server, now in its third edition. He also stewards Quepid, an open source platform for assessing and improving search relevance.

Stavros Macrakis

Amazon

Stavros Macrakis is the senior technical product manager for the OpenSearch Project, focusing on document and e-commerce search. He has worked on search at Lycos, FAST, GLG, Google, and AWS for almost 20 years and is passionate about search relevance.